The Iterative Homogeneity of Variance Index: Improving Negative Variance Estimates in Meta-Analysis
Running head: ESTIMATING HETEROGENEITY IN META-ANALYSIS

Authors

  • Piers D. Steel
  • John D. Kammeyer-Mueller
Abstract

Determining the variability of observed relationships is a critical step in quantitative research synthesis, requiring the estimation of $\hat{\sigma}^2_{\rho}$, the variance due to moderator effects. When observed variance is less than expected from sampling error, the estimated residual variance is negative, which most traditional techniques recode to zero. We review an alternative technique for estimating $\hat{\sigma}^2_{\rho}$ that avoids this recoding and considers the entire continuum of source populations that could give rise to an observed variance. When tested in simulated meta-analytic datasets, the revised technique for estimating $\hat{\sigma}^2_{\rho}$ outperforms the traditional method overall and under most conditions. Importantly, this suggests that heterogeneity of variance in research results has likely been understated in many small and moderately sized meta-analyses.

THE ITERATIVE HOMOGENEITY OF VARIANCE INDEX

A fundamental scientific question is the generalizability, transportability, or situational specificity of results. Researchers are often interested in questions such as, “will our treatment work for these people?” or “will our predictions hold true given this context?” If an observed relationship does generalize across populations, its implications can be applied across a wide range of practical settings, and researchers can direct their attention to new problems beyond replication. Conversely, if an observed relationship fails to generalize, further research is needed to explore when and how the relationship does or does not occur. The primary goal of many meta-analyses is to investigate the variability of the population correlation, so much so that one of the most widely followed methods of meta-analysis is sometimes called validity generalization (Hunter & Schmidt, 1990).
In meta-analysis, homogeneity tests or estimates of the proportion of variance accounted for by statistical artifacts are frequently used to determine whether significant situational variance exists (Cortina, in press). Establishing homogeneity of variance typically requires the estimation of the component $\hat{\sigma}^2_{\rho}$ (also referred to as $\hat{\sigma}^2_{\tau}$ or $\hat{\tau}^2$), which represents the variance above that expected from sampling error alone. As the observed variability in a set of studies becomes greater than expected from sampling error, the possibility increases that important moderators of the observed relationship exist. In terms of a moderator analysis, as the percentage of total variance unexplained by sampling error rises above 0%, real heterogeneity in relationships across contexts becomes increasingly likely. The practical utility of expressing moderator variance as a percentage of total observed variance is that it represents variability across studies in a common metric, regardless of the underlying population correlation level or sample size.

In weighing an estimator of residual variance, it is worth considering what type of estimator is desirable. An appropriate estimate of the population variance should have low bias and high efficiency (Nunnally & Bernstein, 1994; Rice, 1995). An estimator with low bias gives estimates that, on average, are close to the actual population value. An efficient estimator gives similar values across multiple samples from the same population, meaning that the variance of the estimates is low. In the present context, a low-bias estimator of the residual variance will, on average, come close to the true population variance, while an efficient estimator will give similar estimates of the population variance across multiple samples from the same population.
As we shall see, the traditional method for estimating residual variance falls short on these counts in many cases, suggesting a need for alternatives.

The traditional solution

Estimates of $\hat{\sigma}^2_{\rho}$ have been obtained through several techniques that share a common step: subtracting the variance expected from sampling error from the observed variance (e.g., Hedges & Vevea, 1998; Hunter & Schmidt, 1990; Overton, 1998). At first blush, it would seem that any residual variance above and beyond sampling error represents the true population variance, but there is a significant problem with this approach. These traditional methods for estimating $\hat{\sigma}^2_{\rho}$ often produce negative estimates of residual variance. This happens when the observed variance is less than that expected from sampling error alone. Since negative variance is impossible, researchers resolve this problem by recoding negative residual variance estimates as zero and concluding that none of the observed variability across studies can be explained by moderators (e.g., Hall & Brannick, 2002; Hardy & Thompson, 1998; Overton, 1998).

Unfortunately, this adjustment comes at a cost, since it may classify variance as non-existent when there is in fact real variance in the population. This is because the observed variance is based on a finite sample and is therefore prone to second-order sampling error (i.e., even if each study in a meta-analysis is error-free, the outcome still depends on which particular studies happen to be available). Consider Figure 1, which shows the distribution of observed variances for two populations, one where $\sigma^2_{\rho}$ = 0% and another where $\sigma^2_{\rho}$ = 25% of the total variance, in meta-analytic samples with $K = 20$, $N = 100$, and $\rho = .20$. As can be seen, a substantial portion of the observed distribution for both populations lies below expected sampling error.
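The subtract-and-recode logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the source does not spell out its formulas, so the sketch assumes the common bare-bones approach for correlations (sample-size-weighted observed variance, and an expected sampling-error variance of $(1 - \bar{r}^2)^2 / (\bar{N} - 1)$). The function name and the example data are hypothetical.

```python
def traditional_residual_variance(rs, ns):
    """Return (raw, recoded) residual variance for correlations rs with sample sizes ns.

    Assumed bare-bones formulas; not taken from the source text.
    """
    total_n = sum(ns)
    k = len(rs)
    # Sample-size-weighted mean correlation
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample-size-weighted observed variance of the correlations
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone (assumed formula)
    n_bar = total_n / k
    var_err = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    raw = var_obs - var_err
    # Traditional fix: negative estimates are recoded to zero
    return raw, max(raw, 0.0)


# Hypothetical data: five tightly clustered correlations, N = 100 each.
raw, recoded = traditional_residual_variance([0.18, 0.20, 0.22, 0.19, 0.21], [100] * 5)
print(raw, recoded)  # raw is negative here, so the recoded value is 0.0
```

Returning both values keeps the information that recoding throws away: the size of the negative raw estimate, which is the starting point for the alternative reviewed in this paper.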
Observed variance falling below expected sampling error is not a problem when $\sigma^2_{\rho}$ = 0%, because recoding the estimates to zero then gives a correct estimate of the population variance. However, even when $\sigma^2_{\rho}$ = 25%, Figure 1 shows that researchers will conclude that there is no residual variance in about one quarter of the meta-analyses. In other words, even with non-trivial levels of variability in the population correlation, researchers will erroneously conclude that there is no variance to explain if they recode their negative variance estimates to zero.

The traditional estimator as the mode

To provide better estimates when negative variance occurs, we need a better understanding of the source distributions of such estimates. We ask the question, “Where could this observed score come from?” This requires us to reverse matters somewhat. Typically, we consider specific population values and then determine the distribution of possible observed scores that might be generated under different conditions, as in Figure 1. Here we present the distribution of possible source population values that might have led to specific observed scores. Figure 2 illustrates this distribution of possible population variances for the two observed residual variances of $\hat{\sigma}^2_{\rho}$ = -7% and $\hat{\sigma}^2_{\rho}$ = -17%. These distributions represent the multitude of values from which our observed scores could have originated. As can be seen, the differences between Figure 1 and Figure 2 are extreme. Because negative variance is impossible, Figure 2’s distributions of possible populations begin at zero and move upwards. Of note, since the density for $\hat{\sigma}^2_{\rho}$ = -7% extends further to the right than the density for $\hat{\sigma}^2_{\rho}$ = -17%, the estimate of $\hat{\sigma}^2_{\rho}$ = -7% is more likely to occur when the true population variance is larger. In typical meta-analytic practice, both estimates of -7% and -17% would be recoded to zero.
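How often the raw estimate turns negative can be explored with a small simulation. The sketch below is not the authors' simulation design. It assumes equal sample sizes, normally distributed population correlations, the bare-bones sampling-error formula $(1 - \bar{r}^2)^2 / (N - 1)$, and a Fisher z approximation for drawing sample correlations. The function names, the number of replications, and the seed are all hypothetical choices.

```python
import math
import random


def raw_residual_variance(rs, n):
    """Observed variance minus expected sampling-error variance (equal N assumed)."""
    k = len(rs)
    r_bar = sum(rs) / k
    var_obs = sum((r - r_bar) ** 2 for r in rs) / k
    var_err = (1 - r_bar ** 2) ** 2 / (n - 1)
    return var_obs - var_err


def share_negative(prop_true, k=20, n=100, rho=0.20, reps=2000, seed=1):
    """Fraction of simulated meta-analyses whose raw residual variance is negative.

    prop_true is the true variance as a proportion of total variance,
    i.e. var_rho / (var_rho + var_err).
    """
    rng = random.Random(seed)
    var_err = (1 - rho ** 2) ** 2 / (n - 1)
    var_rho = prop_true * var_err / (1 - prop_true)
    sd_rho = math.sqrt(var_rho)
    negatives = 0
    for _ in range(reps):
        rs = []
        for _ in range(k):
            # Each study gets its own population correlation
            rho_i = rng.gauss(rho, sd_rho)
            # Fisher z approximation to the sampling distribution of r
            z = rng.gauss(math.atanh(rho_i), 1 / math.sqrt(n - 3))
            rs.append(math.tanh(z))
        if raw_residual_variance(rs, n) < 0:
            negatives += 1
    return negatives / reps


print(share_negative(0.0))   # homogeneous population
print(share_negative(0.25))  # true variance is 25% of total variance
```

Under these assumptions the second figure can be compared with the "about one quarter" read from Figure 1. The exact value depends on the approximations chosen here.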
Recoding to zero ignores the wide continuum of possible source population variances that could have given rise to a negative observed score, and this raises a very important question: from these distributions of possible source population variances, is the absolute lowest value the best estimate? To determine whether zero recoding is our best option, we must consider what makes a good measure of central tendency. As Figure 2 indicates, a wide selection of possible source populations could have given rise to our observed score, and we must choose one single value to best represent these possibilities. Essentially, the traditional method of recoding negative variance to zero uses the mode of the distribution as the measure of central tendency. This is shown in Figure 2 by zero being the highest point on both distributions, meaning it is the single most likely value to occur in the absence of other information (Rice, 1995). However, for right-skewed distributions, as we have here, the use of the mode is problematic. As the Handbook of Statistical Methods (2003) indicates, “For severely-skewed distributions, the mode may be at or near the left or right tail of the data and so it seems not to be a good representative of the center of the distribution [italics added].” In other words, zero variance is the lower bound of all possible estimates, so the true value will almost always be higher. Since virtually 100% of all true scores will be larger than this estimate, the traditional method is considerably biased.

Being a poor estimator in this situation, the mode creates an unwanted outcome. Because negative variance estimates can be obtained even in samples drawn from a population with a positive variance in effect sizes, methodology that simply recodes negative variance as zero variance is likely to indicate perfect homogeneity in situations where such an inference is unwarranted.
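The gap between the mode and the other measures of central tendency is easy to see numerically. The sketch below does not use the authors' derived source distribution, which this section does not specify. As a purely illustrative stand-in it assumes a bell-shaped density centered on a negative value and truncated at zero, which has the same qualitative shape as the curves described for Figure 2 (highest at zero, right-skewed). The function name and the parameters `location`, `scale`, `upper`, and `steps` are hypothetical.

```python
import math


def summarize_truncated_density(location=-0.07, scale=0.10, upper=1.0, steps=10000):
    """Mode, mean, and median of an illustrative density restricted to [0, upper].

    The density is proportional to a normal curve centered at `location`;
    this shape is an assumption made for illustration only.
    """
    width = upper / steps
    # Midpoints of narrow strips covering the admissible range
    mids = [(i + 0.5) * width for i in range(steps)]
    dens = [math.exp(-0.5 * ((x - location) / scale) ** 2) for x in mids]
    total = sum(dens)
    # Normalize so the strip probabilities sum to one
    probs = [d / total for d in dens]
    mode = mids[max(range(steps), key=lambda i: probs[i])]
    mean = sum(x * p for x, p in zip(mids, probs))
    # Median: first strip where the cumulative probability reaches one half
    cum = 0.0
    median = mids[-1]
    for x, p in zip(mids, probs):
        cum += p
        if cum >= 0.5:
            median = x
            break
    return mode, mean, median


mode, mean, median = summarize_truncated_density()
print(mode, median, mean)  # the mode sits at the lower bound; median and mean lie above it
```

For any density of this shape the mode is pinned to the first strip next to zero, while the median and mean move with the spread of the distribution, which is why they carry more information about the plausible size of the residual variance.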
Falsely reporting homogeneity can have dire effects. As Sackett, Harris, and Orr (1986) noted, “If the meta-analysis was not powerful enough to detect the presence of one or more moderators, the future research to detect the presence of one or moderators would most likely not be conducted in light of the results of the meta-analysis” (p. 303). Similarly, if using the mode causes us to mistakenly indicate that there is no situational variability, it discourages future research into the (potentially theoretically interesting) sources of residual variance. We review here an alternative technique that directly addresses this issue. It begins where the previous techniques end, with their estimation of $\hat{\sigma}^2_{\rho}$, and mathematically constructs a range of possible distributions that might have led to the negative estimates.

The Iterative Homogeneity of Variance Index: Purpose and Rationale

To summarize, when $\hat{\sigma}^2_{\rho}$ < 0, its source population value ($\sigma^2_{\rho}$) is likely to be small but greater than zero. Only with extremely large sample sizes can one state with confidence that complete homogeneity has been identified. Otherwise, there remains a strong possibility that the actual residual variance is positive. The challenge, then, is to create an estimate that reflects the practical size of the potential residual variance rather than focusing on the lower bound of zero. In addition, it would be desirable to have a technique that improves the estimation of $\hat{\sigma}^2_{\rho}$ across the entire continuum of possible residual variance levels, even in cases where the estimated variance is not negative. Such a method should consider only possible source population variances, in contrast to the raw $\hat{\sigma}^2_{\rho}$, which assumes a full non-truncated distribution. An intuitively appealing numerical approach is an iterative estimation procedure, which we term the Iterative Homogeneity of Variance Index (IHVI).
Its first step is to establish the possible source population distribution that could have produced the residual variance we in fact observe (e.g., Figure 2). With an accurate distribution, we have our choice of central tendency measures, including ones that work better than the mode. When the distribution is skewed, most basic statistical texts indicate that the mode is severely problematic and that the mean or, especially, the median is more representative (e.g., Murphy & Davidshofer, 1998; Ray & Ravizza, 1988; Weinberg & Goldberg, 1990). Consequently, step two is calculating these specific estimators (i.e., the mode, mean, and median) based upon the derived distribution. We review each of these steps in turn.

Determining the possible source population distribution

As mentioned, Figure 2 represents the possible source population distributions for two originally negative values of $\hat{\sigma}^2_{\rho}$. To use the IHVI methodology, we need to determine similar potential source population distributions for any other initial estimate of variance. Consequently, we need a way to establish the likelihood, that is, the probability, that any population value is the source of our observed variance. Plotting probabilities along the y-axis and population values along the x-axis creates our target distribution. Unfortunately, because variance is a continuous variable, the probability of any single point along this continuous distribution is zero. We avoid this problem by applying the conventional calculus technique of taking definite integrals over numerous very small areas under a probability density function. As applied here, we start by establishing the range of possible source population values and then dividing this range into strips, each with an upper and lower bound. The probability of $\hat{\sigma}^2_{\rho}$ being generated by a population value within any strip is greater


Publication date: 2003